ABSTRACT

As the quality of the available galaxy cluster data improves, the models fitted 
to these data might be expected to become increasingly complex. Here we present 
the Bayesian approach to the problem of cluster data modelling: starting from sim- 
ple, physically motivated parameterised functions to describe the cluster's gas density, 
gravitational potential and temperature, we explore the high-dimensional parameter 
spaces with a Markov-Chain Monte-Carlo sampler, and compute the Bayesian evi- 
dence in order to make probabilistic statements about the models tested. In this way 
sufficiently good data will enable the models to be distinguished, enhancing our astro- 
physical understanding; in any case the models may be marginalised over in the correct 
way when estimating global, perhaps cosmological, parameters. In this work we apply 
this methodology to two sets of simulated interferometric Sunyaev-Zel'dovich effect 
and gravitational weak lensing data, corresponding to current and next-generation 
telescopes. We calculate the expected precision on the measurement of the cluster gas 
fraction from such experiments, and investigate the effect of the primordial CMB fluc- 
tuations on their accuracy. We find that data from instruments such as AMI, when 
combined with wide-field ground-based weak lensing data, should allow both cluster 
model selection and estimation of gas fractions to a precision of better than 30 percent 
for a given cluster. 

Key words: methods: data analysis - cosmology: observations - galaxies: clusters: 
general - cosmic microwave background - cosmology: theory - dark matter - gravita- 
tional lensing 



1 INTRODUCTION 

Clusters of galaxies, as the largest gravitationally bound
structures in the Universe, may be used as cosmological
probes. The number count of clusters as a function of
their mass has been predicted both analytically (see e.g.
Press & Schechter 1974; Sheth et al. 2001) and from large
scale numerical simulations (see e.g. Jenkins et al. 2001;
Evrard et al. 2002), and is very sensitive to the cosmological
parameters $\sigma_8$ and $\Omega_{\rm m}$ (Battye & Weller 2003). The
size and formation history of massive clusters are such that
the ratio of gas mass to total mass is expected to be
representative of the universal ratio $\Omega_{\rm b}/\Omega_{\rm m}$, once the relatively
small amount of baryonic matter in the cluster galaxies is
taken into account (White et al. 1993).

The deep gravitational potential wells of clusters
contain hot ionised gas, which radiates via thermal
bremsstrahlung in the X-ray waveband, with a luminosity
proportional to the projected squared gas density. Inverse
Compton scattering of cosmic microwave background (CMB)
radiation photons by this gas produces an observable
shift in the thermal spectrum of the CMB in the
direction of the cluster (the Sunyaev-Zel'dovich (SZ) effect;
Sunyaev & Zeldovich 1972), which is proportional to the
projected gas density. Comparing the SZ and X-ray flux
densities allows the angular size distance, and so the Hubble
constant, to be measured (Cavaliere, Danese & de Zotti 1979).

To date, most of our knowledge about clusters has come 
from optical studies of the cluster member galaxies, and from 
X-ray observations of the intra-cluster medium. In particu- 
lar, the recently acquired data from the Chandra and XMM 
missions have allowed the spatial and spectral features of 
the X-ray emission from galaxy clusters to be measured in 
unprecedented detail. This has allowed the measurement of
the mass function and global cluster gas fraction to be
attempted (Allen et al. 2002; Allen et al. 2003); however, the
methodology requires X-ray data from the most massive,
relaxed clusters, and includes assumptions of high symmetry
and hydrostatic equilibrium.

It is becoming increasingly common to measure the
distribution of gas and mass in galaxy clusters by other
means. Weakly gravitationally lensed images of field galaxies
lying behind the cluster are now routinely observed
and used to infer the projected mass distribution of the
cluster (see e.g. Kaiser & Squires 1993; Dahle et al.;
Clowe & Schneider). Similarly, the SZ effect has been
observed in many clusters (see e.g. Jones et al. 1993;
… et al. 1996; Grego et al. 2001), and has been used
with some success in the determination of $H_0$ (see e.g.
Birkinshaw & Hughes 1994; Jones et al.; Mason et al.).

A complete analysis of clusters of galaxies utilising all these
data at the same time would be desirable. In this paper we
outline our general approach to this task, and show how our
methods give natural answers to the specific questions being
asked of the data. Either Hubble's parameter, or the cluster
gas fraction, or indeed both, could equally well be chosen as
the cosmological parameter for the demonstrative purposes
of this work; for simplicity we focus on the gas fraction,
setting $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.

The quality of X-ray data in the past, and that of cur- 
rent weak lensing and SZ data, is such that the informa- 
tion content of each dataset individually matches that of 
a simple, symmetric, constrained model, with just a hand- 
ful of parameters. One might expect that combining the 
datasets would lead to an improvement in the parameter 
estimates, and thus open up the possibility of testing more 
complex models. In this work we simulate and compare
observations made with current generation SZ telescopes,
such as the VSA (Watson et al. 2003) and CBI (Pearson
et al. 2003), with those under construction such as
AMI (Kneissl et al. 2001) and the SZA (Mohr et al. 2002).
In each case we match the mock SZ observations with sim- 
ulated wide field lensing data, and investigate the prospects 
for measuring the cluster gas fraction via this route. 

In Section 2 we discuss a systematic approach to the
problem of cluster model parameter fitting and model
selection. We also show in this context how to infer
model-independent conclusions from cluster data, where the
derived errors on, in particular, any inferred cosmological
parameters take the uncertainty in the cluster model into
account. After an introduction to the example cluster data
and models in Sections 3 and 4, we apply these methods to
simulated sets of data from weak gravitational lensing and
the SZ effect observed by microwave interferometers in
Section 5, and discuss the prospects of the aforementioned
current and near-future experiments. We then put the results
of this demonstration in context with a brief discussion of
our methodology and others (Section 6), and summarise our
conclusions in Section 7.



2 BAYESIAN INFERENCE 

For a set of data arranged into a vector $\mathbf{d}$, our knowledge of
the experimental errors on those data can be written in the
form of a likelihood function $L$:

$$L = \Pr(\mathbf{d}|\boldsymbol{\theta}, H_i), \tag{1}$$

which is a function of the observed data and depends on the
parameter vector $\boldsymbol{\theta}$ of an assumed model or hypothesis $H_i$.
By the central limit theorem the likelihood can often be
approximated by a multivariate Gaussian,

$$L = \frac{1}{|2\pi\mathbf{C}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{d}-\mathbf{d}^{\rm p})^{\rm T}\mathbf{C}^{-1}(\mathbf{d}-\mathbf{d}^{\rm p})\right], \tag{2}$$

where $\mathbf{C}$ is the noise covariance matrix and the predicted data
vector $\mathbf{d}^{\rm p}$ is calculated within the model $H_i$. However, other
forms of the likelihood function may describe the data more
accurately; none of the rest of this section relies on the
Gaussian approximation. If the data vector can be written as

$$\mathbf{d} = \mathbf{d}_1 + \mathbf{d}_2, \tag{3}$$

where the two sub-datasets are independent, then we can
write the joint likelihood as

$$\Pr(\mathbf{d}|\boldsymbol{\theta}, H_i) = \Pr(\mathbf{d}_1|\boldsymbol{\theta}, H_i)\,\Pr(\mathbf{d}_2|\boldsymbol{\theta}, H_i), \tag{4}$$

as has been applied to cluster potential analysis
by Castander et al. (2000).
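
As a minimal sketch of equations (2) and (4), the Gaussian log-likelihood and its sum over independent datasets can be written in a few lines of Python. The function names and the use of a Cholesky factorisation are illustrative assumptions, not a description of the code used for this work.

```python
import numpy as np

def gaussian_log_likelihood(d, d_pred, cov):
    """Log of the multivariate Gaussian likelihood of equation (2)."""
    r = np.asarray(d, dtype=float) - np.asarray(d_pred, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    # z = chol^{-1} r, so that z.z = r^T C^{-1} r
    z = np.linalg.solve(chol, r)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    # log |2 pi C|^{1/2} = (n/2) log(2 pi) + (1/2) log |C|
    return -0.5 * (z @ z) - 0.5 * (r.size * np.log(2.0 * np.pi) + log_det)

def joint_log_likelihood(datasets):
    """Equation (4) in log form: independent datasets simply add.

    `datasets` is a list of (d, d_pred, cov) tuples (hypothetical layout).
    """
    return sum(gaussian_log_likelihood(d, dp, c) for d, dp, c in datasets)
```

Working with logarithms avoids numerical underflow when the data vectors are long, which is why the joint likelihood is accumulated as a sum rather than a product.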

At this point we can ask the following questions: 

(i) "What are the relative probabilities of any two models 
Hi and Hj being the true model, given all the information 
we have?" 

(ii) "What is the joint probability distribution of the
parameters $\boldsymbol{\theta}$ of model $H_i$, given all the information we have?"

(iii) "What is the probability distribution of any one
particularly interesting parameter $\theta_k$, given all the information
we have?"

In the context of galaxy cluster data analysis, a model will 
be a set of suitably parameterised functions describing the 
cluster potential, gas density and temperature, and a suit- 
ably parameterised background cosmology; all of these
parameters are included in the vector $\boldsymbol{\theta}$ and should be inferred
simultaneously. Learning the structure of the cluster involves
finding the most appropriate cluster model and its parame- 
ters; the first two questions above can therefore be bracketed 
together as "astrophysical" questions. To do cosmology with 
clusters one just wants to investigate the cosmological part 
of the parameter space, in a way that is independent of the 
cluster model. In this respect, question (iii) can be labelled 
"cosmological" . 



2.1 Model selection and parameter fitting: 
astrophysics 

The answer to question (ii) above is given by Bayes' theorem:

$$\Pr(\boldsymbol{\theta}|\mathbf{d}, H_i) = \frac{\Pr(\mathbf{d}|\boldsymbol{\theta}, H_i)\,\Pr(\boldsymbol{\theta}|H_i)}{\Pr(\mathbf{d}|H_i)}, \tag{5}$$

where $\Pr(\mathbf{d}|\boldsymbol{\theta}, H_i)$ is the likelihood $L$ of equation (1).
The probability density function (pdf) $\Pr(\boldsymbol{\theta}|H_i)$ should
encode any prior knowledge we have of the parameters of the
model in question. For example, uncertainty as to the
order of magnitude of a parameter should be represented by
a uniform probability distribution in the logarithm of the
parameter (Jeffreys 1932), whilst a previously performed,
independent measurement of a parameter might be
interpreted as a Gaussian prior centred on the observed value
with width equal to the quoted error.

The denominator of equation (5) plays an important
role in answering question (i). Applying Bayes' theorem
again we have



$$\Pr(H_i|\mathbf{d}) = \frac{\Pr(\mathbf{d}|H_i)\,\Pr(H_i)}{\Pr(\mathbf{d})}. \tag{6}$$

Here, the denominator is a constant. Moreover, if, prior to
the analysis, the various models are assigned equal
probabilities (as might be the case for an attempt at a "fair
test", or if the analyst really does have no more belief in
any one model than another), then the quantity $\Pr(\mathbf{d}|H_i)$
may be used in comparing the relative probabilities of each
model under consideration. For more details on the use of
this quantity, known as the "evidence", see e.g. Jaynes (2003)
and MacKay (1992). Calculation of the evidence is in principle
straightforward; marginalising out the $N_i$ parameters of the
model gives

$$\Pr(\mathbf{d}|H_i) = \int \Pr(\mathbf{d}|\boldsymbol{\theta}, H_i)\,\Pr(\boldsymbol{\theta}|H_i)\,d^{N_i}\theta. \tag{7}$$



However, for the purposes of this work it is useful to note 
that the evidence is sensitive both to the parameter prior 
and the likelihood. Models providing a better fit to the data, 
and so a higher value at the peak of the likelihood func- 
tion, have higher evidence associated with them. Conversely, 
models whose priors define large volumes of low likelihood 
parameter space will give lower evidence values - overly com- 
plex models are penalised. 

We now turn to the practical problems associated with
the calculation of the posterior probability $\Pr(\boldsymbol{\theta}|\mathbf{d}, H_i)$ and
the associated evidence $\Pr(\mathbf{d}|H_i)$. One approach, much used
in the field of cosmological parameter estimation, is to
calculate the prior density and likelihood on a discrete parameter
grid. However, as the models considered become more
complex, this task becomes exponentially more computationally
intensive. As an illustration, if a likelihood evaluation for a
10 parameter model takes just 1 second, the time to produce
a grid of posterior probability densities, ten pixels in each
dimension, is of order $10^{10}$ seconds, or 300 years. Clearly
this procedure is neither practical nor accurate. Instead, we
use an exploratory Markov-Chain Monte-Carlo (MCMC)
algorithm to draw samples from the multi-dimensional
unnormalised posterior $\Pr(\mathbf{d}|\boldsymbol{\theta}, H_i)\Pr(\boldsymbol{\theta}|H_i)$ (Gilks et al. 1996).
The output of the MCMC algorithm is a list of samples 
whose number density in parameter space is proportional to 
the posterior probability density. This allows a representa- 
tion of the desired posterior to be constructed which is both 
efficiently calculated (computation time is typically of the 
order of a few hours) and convenient to use. For example, 
the sample parameter values may be combined to calculate 
predictions for other properties of the model. 
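
The idea of drawing samples whose number density follows the unnormalised posterior can be illustrated with a basic Metropolis sampler. This is only a sketch under stated assumptions: the sampler used in this work is not specified in detail here, and the toy two-parameter posterior, the step size and the burn-in length below are all hypothetical choices.

```python
import numpy as np

def metropolis(log_post, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for an unnormalised log-posterior."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    logp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    for k in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[k] = theta
    return chain

def log_post(theta):
    """Toy target: unit Gaussian likelihood centred on (1, -2), flat prior in a box."""
    if np.any(np.abs(theta) > 10.0):
        return -np.inf
    return -0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)

chain = metropolis(log_post, theta0=[0.0, 0.0], step=1.0, n_samples=20000)
samples = chain[2000:]  # discard an initial burn-in segment
```

Each row of `samples` is one parameter vector, so derived quantities can be computed row by row and inherit the correct posterior weighting automatically.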

Calculating the evidence by performing the integral
of equation (7) numerically by ordinary means would of
course involve a similar number of operations to that
outlined above. We instead make use of the technique of
"thermodynamic integration" to calculate the evidence
dynamically during an initial "burn-in" phase of the MCMC
algorithm (Gilks et al. 1996). This process results in evidence
values precise to a fraction of a unit in $\log_e \Pr(\mathbf{d}|H_i)$. This
corresponds to a probability ratio between two models of
less than 3, well below the "belief threshold" of a sensible
analyst.
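
The identity behind thermodynamic integration is that, for a normalised prior, the log-evidence equals the integral over $\lambda$ from 0 to 1 of the mean log-likelihood under the tempered distribution proportional to $L^{\lambda}$ times the prior. The sketch below checks this on a one-parameter toy problem. It is an illustration only: the expectations are computed by quadrature on a grid rather than from MCMC samples during burn-in, and the likelihood, prior range and grid sizes are assumptions.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezium-rule integral of samples y on grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Toy problem: uniform prior on [-5, 5], unit Gaussian likelihood centred on 1.
theta = np.linspace(-5.0, 5.0, 4001)
prior = np.full_like(theta, 1.0 / 10.0)
log_like = -0.5 * (theta - 1.0) ** 2 - 0.5 * np.log(2.0 * np.pi)

def mean_log_like(lam):
    """<log L> under the tempered posterior proportional to prior * L**lam."""
    w = prior * np.exp(lam * log_like)
    return trapezoid(w * log_like, theta) / trapezoid(w, theta)

lams = np.linspace(0.0, 1.0, 2001)
integrand = np.array([mean_log_like(lam) for lam in lams])
log_evidence_ti = trapezoid(integrand, lams)

# Direct evaluation of equation (7) for comparison.
log_evidence_direct = np.log(trapezoid(prior * np.exp(log_like), theta))
```

In a sampling implementation the inner expectation would be replaced by an average of log-likelihood values over samples drawn at each value of $\lambda$ as it is raised from 0 to 1.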



2.2 Marginalising over parameters and models: 
cosmology 

A major benefit of working with samples drawn from the
posterior rather than a grid of posterior values is that the
process of marginalisation becomes trivial. When estimating
a single parameter $\theta_i$ of a model $H_j$ the distribution of
interest is

$$\Pr(\theta_i|\mathbf{d}, H_j) = \int \Pr(\boldsymbol{\theta}|\mathbf{d}, H_j) \prod_{l \neq i} d\theta_l, \tag{8}$$

where the integral is over all other parameters. This
projection takes into account all the parameter degeneracies of
the model, as well as the priors on all of the parameters. An
ensemble of sample $\boldsymbol{\theta}$-vectors drawn from the full posterior
can be projected onto the $\theta_i$ direction just by extracting the
$\theta_i$ values from the ensemble.
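
In code, the projection of equation (8) amounts to selecting one column of the sample array. The sketch below uses synthetic correlated samples as a stand-in for MCMC output; the parameter values and the `marginal` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for MCMC output: correlated samples of two parameters.
mean = np.array([0.1, 8.0])
cov = np.array([[0.0004, 0.01],
                [0.01, 1.0]])
samples = rng.multivariate_normal(mean, cov, size=50000)

def marginal(samples, i, bins=40):
    """Marginal summary of parameter i: mean, standard deviation, histogram."""
    values = samples[:, i]  # the projection onto the theta_i direction
    density, edges = np.histogram(values, bins=bins, density=True)
    return values.mean(), values.std(), density, edges
```

No integration over the other parameter is carried out explicitly; the correlations are already encoded in where the samples lie.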

Although computing n-point statistics from a set of 
samples is very easy, reconstructing the marginalised pos- 
terior pdf is not so straightforward. To do this we provide 
as estimators for these distributions histograms of sample 
values smoothed to some arbitrary length scale, such that 
we err on the side of caution and never underestimate the 
distribution widths. This is in keeping with our general ap- 
proach: the marginalised posterior probability distribution 
$\Pr(\theta_k|\mathbf{d})$ is in general broader than those conditional on
other parameters being fixed at, for instance, the maximum 
likelihood point. The resulting estimators are therefore as 
accurate as possible, since the maximum amount of infor- 
mation has been included, and as precise as allowed by the 
quality of the data, since the errors on the observed quan- 
tities have been rigorously propagated in a self-consistent 
way. 

By taking this process one step further we can answer 
question (iii), by marginalising over model space. This proce- 
dure is important in the context of cluster data analysis: we 
want to be able to make robust, model-independent state- 
ments about cosmological parameters from cluster data. To 
this end, and denoting the cosmological parameter of
interest $\theta_k$, one should calculate

$$\Pr(\theta_k|\mathbf{d}) = \sum_j \Pr(\theta_k|\mathbf{d}, H_j)\,\Pr(H_j|\mathbf{d}) \tag{9}$$

$$\propto \sum_j \Pr(\theta_k|\mathbf{d}, H_j)\,\Pr(\mathbf{d}|H_j)\,\Pr(H_j). \tag{10}$$

The last proportionality arises because we have
discarded the normalising constant of equation (6).
Equation (10) is now a relationship between quantities that we
can calculate, and represents a model-averaging process. We
might hope that one particular model would be many times
more probable given the data, in which case we have learnt
something about the astrophysics of the cluster and the sum
will be dominated by this model. On the other hand, it is
always possible to construct a range of models whose
parameter priors all match the data's likelihood equally well, and
so give similar evidences. In this case the averaging process
serves to increase the width of the posterior density of
interest, that of the interesting parameter $\theta_k$ given the data only,
by an amount appropriate to the uncertainty over the range
of models. In this way, (quasi-)model-independent
statements can be made about cosmological parameters, such as
the cluster gas fraction, from the cluster data.
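
With posterior samples in hand for each model, the model average of equation (10) can be approximated by resampling from each model's chain in proportion to its evidence times its prior probability. The following is a sketch under that assumption; the function name and argument layout are hypothetical.

```python
import numpy as np

def model_average(samples_by_model, log_evidences, n_draws, seed=0, log_priors=None):
    """Draw from the model-averaged posterior of one parameter (equation 10).

    samples_by_model: list of 1-D arrays of theta_k samples, one per model.
    log_evidences:    log Pr(d | H_j) for each model.
    log_priors:       optional log Pr(H_j); equal model priors if omitted.
    """
    rng = np.random.default_rng(seed)
    log_w = np.asarray(log_evidences, dtype=float)
    if log_priors is not None:
        log_w = log_w + np.asarray(log_priors, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract the maximum to avoid overflow
    w /= w.sum()                     # normalised weights, i.e. Pr(H_j | d)
    counts = rng.multinomial(n_draws, w)
    draws = [rng.choice(np.asarray(s), size=c, replace=True)
             for s, c in zip(samples_by_model, counts)]
    return np.concatenate(draws), w
```

When one model dominates the evidence, almost all draws come from its chain; when evidences are similar, the mixture is broader than either individual posterior, as described above.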



3 DATA

In this section we give a brief introduction to weak lensing 
and interferometric Sunyaev-Zel'dovich effect data, demon- 
strating the construction of the likelihood function in each 
case. 



3.1 Weak lensing 

Weak gravitational lensing may be used to investigate
cluster mass distributions through the relationship (see e.g.
Schramm & Kayser 1995)

$$\langle \epsilon \rangle = g, \tag{11}$$

that is, the average complex ellipticity $\epsilon = \epsilon_1 + i\epsilon_2$ of an
ensemble of background galaxy images is an unbiased
estimator of the local reduced shear field $g$ due to the
cluster. Equation (11) allows us to use each of the $2N_{\rm gal}$ lensed
ellipticity components $\epsilon_j$ of $N_{\rm gal}$ measured background
galaxy images as noisy estimators for the corresponding
component of the reduced shear $g_j(\boldsymbol{\theta})$ due to the cluster
model, parameterised with the variables $\boldsymbol{\theta}$ (Seitz et al. 1998;
Schneider et al. 2000; Marshall et al. 2002).

Here, the complex ellipticity is defined such that an
elliptical object with semi-major axis $a$, semi-minor axis $b$
and orientation angle $\phi$ (measured anticlockwise from the $x$-axis
to the semi-major axis) has ellipticity

$$\epsilon = \frac{a-b}{a+b}\,e^{2i\phi}. \tag{12}$$

Figure 1 of Kaiser et al. (1995) shows the image shapes
corresponding to different values of this (or a similarly
defined) ellipticity; the effect of an isolated mass
concentration lying in front of a galaxy field is to align the image
shapes tangentially about the centre of the mass
distribution. Large catalogues of background galaxies with
measured ellipticities are now almost routinely generated from
wide field optical images, using shape estimation software
such as imcat (Kaiser et al. 1995) or im2shape (Bridle et al.).

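We may organise the observed ellipticities into the data
vector $\mathbf{d}$, having components

$$d_i = \begin{cases} {\rm Re}(\epsilon_i) & (i \le N_{\rm gal}) \\ {\rm Im}(\epsilon_{i-N_{\rm gal}}) & (N_{\rm gal}+1 \le i \le 2N_{\rm gal}). \end{cases} \tag{13}$$

Likewise the corresponding model reduced shears can be
arranged into the predicted data vector $\mathbf{d}^{\rm p}$, having
components

$$d^{\rm p}_i = \begin{cases} {\rm Re}(g_i) & (i \le N_{\rm gal}) \\ {\rm Im}(g_{i-N_{\rm gal}}) & (N_{\rm gal}+1 \le i \le 2N_{\rm gal}). \end{cases} \tag{14}$$

The unlensed ellipticity components may often be taken
as having been drawn independently from a Gaussian
distribution with mean $g_j$ and variance $\sigma^2_{\rm intrinsic}$, leading to a
diagonal noise covariance matrix $\mathbf{C}$. We can then write the
likelihood function as

$$L_{\rm lensing} = \Pr({\rm Data}|\boldsymbol{\theta}) = \frac{1}{Z_L}\exp\left(-\frac{\chi^2}{2}\right), \tag{15}$$

where $\chi^2$ is the usual misfit statistic

$$\chi^2 = \sum_{i=1}^{N_{\rm gal}}\sum_{j=1}^{2}\frac{(\epsilon_{j,i} - g_{j,i})^2}{\sigma^2} \tag{16}$$

$$= (\mathbf{d}-\mathbf{d}^{\rm p})^{\rm T}\mathbf{C}^{-1}(\mathbf{d}-\mathbf{d}^{\rm p}), \tag{17}$$

and the normalisation factor is

$$Z_L = (2\pi)^{2N_{\rm gal}/2}\,|\mathbf{C}|^{1/2}. \tag{18}$$

The effect of Gaussian errors introduced by the galaxy
shape estimation procedure has been included by adding
them in quadrature to the intrinsic ellipticity
dispersion (Hoekstra et al. 2000; Marshall et al. 2002),

$$\sigma = \sqrt{\sigma^2_{\rm obs} + \sigma^2_{\rm intrinsic}}. \tag{19}$$

This approximation includes the assumption that the
applied reduced shear is not too large (as is the case for low
redshift clusters).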
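
A minimal sketch of the lensing likelihood of equations (13) to (19) for a diagonal covariance matrix follows. The function name is hypothetical, the default dispersions are the values quoted for the simulations in Section 5, and the model reduced shear at each galaxy position is assumed to be supplied by the caller.

```python
import numpy as np

def lensing_log_likelihood(eps, g_model, sigma_int=0.25, sigma_obs=0.2):
    """Log-likelihood of complex ellipticities given model reduced shears."""
    eps = np.asarray(eps, dtype=complex)
    g = np.asarray(g_model, dtype=complex)
    d = np.concatenate([eps.real, eps.imag])     # data vector, equation (13)
    d_pred = np.concatenate([g.real, g.imag])    # predicted data, equation (14)
    sigma2 = sigma_int ** 2 + sigma_obs ** 2     # quadrature sum, equation (19)
    chi2 = np.sum((d - d_pred) ** 2) / sigma2    # equations (16)-(17), diagonal C
    log_z = 0.5 * d.size * np.log(2.0 * np.pi * sigma2)  # log of equation (18)
    return -0.5 * chi2 - log_z
```

Because the covariance is a multiple of the identity, no matrix inversion is needed; each galaxy contributes two independent terms to the misfit.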

3.2 Sunyaev Zel'dovich effect 

To date, the majority of the observations of the SZ
effect towards clusters of galaxies have been made with
interferometers (see e.g. Joy et al. 2001; Jones et al. 2001;
LaRoque et al. 2003). These instruments have a number
of advantages over single dish telescopes, including
their relative insensitivity to atmospheric emission (e.g.
Lay & Halverson 2000), lack of required receiver stability,
and the ease with which systematic errors such as ground
spill (Watson et al. 2003) and point source contamination
(Grainger et al. 2002; Taylor et al. 2003) can be
minimised.

Assuming a small field size, an interferometer operating
at a single frequency $\nu$ measures samples from the complex
visibility plane $\tilde{I}_\nu(\mathbf{u})$. This is given by the weighted Fourier
transform of the surface brightness $I_\nu$,

$$\tilde{I}_\nu(\mathbf{u}) = \int A_\nu(\mathbf{x})\,I_\nu(\mathbf{x})\,\exp(2\pi i\,\mathbf{u}\cdot\mathbf{x})\,d^2\mathbf{x}, \tag{20}$$

where $\mathbf{x}$ is the position relative to the phase centre, $A_\nu(\mathbf{x})$ is
the (power) primary beam of the antennas at the observing
frequency $\nu$ (normalised to unity at its peak), and $\mathbf{u}$ is a
baseline vector in units of wavelength.

The positions in the $uv$-plane at which this function is
sampled by the interferometer are determined by the
physical positions of its antennas and the direction of the field on
the sky. The samples $\mathbf{u}_j$ lie on a series of curves which we
may denote by the function $\tilde{B}_\nu(\mathbf{u})$ that equals unity where
the Fourier domain (or $uv$-plane) is sampled and equals zero
elsewhere. The function $\tilde{B}_\nu(\mathbf{u})$ may be inverse Fourier
transformed to give the synthesised beam $B_\nu(\mathbf{x})$ of the
interferometer at an observing frequency $\nu$.

For a realistic interferometer, the sample values will also
contain a contribution due to noise; the $j$th baseline $\mathbf{u}_j$ of
an interferometer measures the complex visibility

$$V(\mathbf{u}_j) = \tilde{I}_\nu(\mathbf{u}_j) + N(\mathbf{u}_j), \tag{21}$$

where $N(\mathbf{u}_j)$ is the noise on the $j$th visibility. This noise
comes from two sources: the first is uncorrelated Johnson
noise from the receivers. The second source of noise is
the CMB itself, now known to be very well approximated
by a Gaussian random field defined by its angular power
spectrum. That is, expanding the primordial CMB surface
brightness distribution into a spherical harmonic series gives

$$I^{\rm CMB}(\hat{\mathbf{x}}) = \sum_{\ell}\sum_{m} a_{\ell m}\,Y_{\ell m}(\hat{\mathbf{x}}), \tag{22}$$

leading to the power spectrum

$$c_\ell = \langle |a_{\ell m}|^2 \rangle. \tag{23}$$

For small fields of view, the index $\ell$ is related to the Fourier
($uv$) plane vector $\mathbf{u}$ by $\ell \approx 2\pi|\mathbf{u}|$. An ideal interferometer
would therefore measure the $a_{\ell m}$ directly, giving an extra
noise vector



(24) 



In practice, the interferometer observes the sky surface
brightness multiplied by the primary beam $A_\nu(\mathbf{x})$; the
corresponding convolution with the aperture illumination
function $\tilde{A}_\nu(\mathbf{u})$ in the $uv$-plane produces correlated visibilities.
Since the SZ effect produces a (frequency dependent) linear
perturbation $\delta I_\nu$ to the CMB surface brightness, the noise
on an SZ observation can therefore be described by the
covariance matrix

$$\mathbf{C} = \mathbf{C}^{\rm receiver} + \mathbf{C}^{\rm CMB}. \tag{25}$$

The first term on the right hand side is a diagonal
matrix with elements $\sigma_i^2\delta_{ij}$, with $\sigma_i$ the rms Johnson noise
on the $i$th baseline visibility. The second term contains
significant off-diagonal elements and can be calculated
from a given primordial CMB power spectrum following
Hobson & Maisinger (2002). Rather than dealing with
complex data and covariance matrices, it is convenient to
order the visibility components into a data vector $\mathbf{d}$ with
components

$$d_i = \begin{cases} {\rm Re}(V_i) & (i \le N_{\rm vis}) \\ {\rm Im}(V_{i-N_{\rm vis}}) & (N_{\rm vis}+1 \le i \le 2N_{\rm vis}). \end{cases} \tag{26}$$

With an inversionally symmetric primary beam pattern
there is no correlation between the real and imaginary parts
of the visibilities, so with this ordering of the data the
matrix $\mathbf{C}^{\rm CMB}$ is block-diagonal. The power spectrum is often
approximated to be constant within each of a set of bins in
the spherical harmonic coefficient index $\ell$; these bins
therefore correspond to annuli in the $uv$-plane, and the "flat band
powers" $d_b = \langle \ell(\ell+1)c_\ell \rangle_b$ are then related to the CMB
covariance matrix by an equation of the form

$$C^{\rm CMB}_{ij} = \sum_b d_b\,J_{ij}(|\mathbf{u}|_b, |\mathbf{u}|_{b+1}), \tag{27}$$

where the $b$th bin covers the range $|\mathbf{u}|_b$ to $|\mathbf{u}|_{b+1}$ and the
integrals $J_{ij}$ take the effect of the aperture illumination
function into account. Figure 1 shows the power spectrum
used in the generation of the covariance matrix $\mathbf{C}^{\rm CMB}$ for
simulated VSA data.

With the combined covariance matrix $\mathbf{C}$ in hand, the
likelihood function can be written

$$L_{\rm SZ} = \Pr({\rm Data}|\boldsymbol{\theta}) = \frac{1}{Z_L}\exp\left(-\frac{\chi^2}{2}\right), \tag{28}$$

where $\chi^2$ is again a statistic quantifying the misfit between
observed data $\mathbf{d}$ and predicted data $\mathbf{d}^{\rm p}$, the latter of which
is a function of the model SZ surface brightness $\delta I_\nu$:

$$\chi^2 = (\mathbf{d}-\mathbf{d}^{\rm p})^{\rm T}\mathbf{C}^{-1}(\mathbf{d}-\mathbf{d}^{\rm p}), \tag{29}$$

and the normalisation factor is

$$Z_L = (2\pi)^{2N_{\rm vis}/2}\,|\mathbf{C}|^{1/2}. \tag{30}$$

Figure 1. The flat band powers used to create the covariance
matrix associated with the primary CMB fluctuations on the sky.
The dotted lines correspond roughly to the range of scales that
can be probed by the VSA in its present extended configuration.
Overplotted is a theoretical model corresponding to the flat
$\Lambda$CDM model favoured by the WMAP satellite data.


3.3 Joint analysis 

As given in equation (4), the independent likelihoods
described in the previous subsections can be simply combined
to give the joint likelihood

$$\log L = \log L_{\rm SZ} + \log L_{\rm lensing}. \tag{31}$$

If the datasets contain systematic errors, then some attempt
to deal with this can be made by applying hyperparameters
to the likelihoods (Lahav et al. 2000; Hobson et al. 2002);
this further complication is not considered here. It is
sufficient to note that such a mismatch between the conclusions
drawn independently from the two datasets should result in
a joint log evidence smaller than the sum of the two
individual log evidence values.



4 SIMPLE CLUSTER MODELS 

Clearly the sum over models in equation (10) can be
simplified by discarding the terms with negligible weight; this
corresponds to investigating only physically reasonable models,
starting with the simplest and gradually increasing the
complexity as required by the data via the evidence. With this in
mind, and for the illustrative purpose of this work, we limit
ourselves to investigating just two cluster models, referred
to as "Beta" and "iHSE".

Both models are spherically symmetric, with centroids
assumed to be known to within a Gaussian error of ±1
arcmin. All projected distributions are then circularly
symmetric, with profiles defined as a function of projected radius
$s = \sqrt{(x - x_0)^2 + (y - y_0)^2}$. The three distributions taken
to be fundamental in defining the cluster model are
the gravitational potential due to the total mass
density (including dark matter and intracluster medium), the
density of hot gas in the cluster, and the gas temperature.



4.1 Mass distributions 

For simplicity, we limit ourselves in this work to a single
model for the mass distribution, and choose for this purpose
the NFW profile (Navarro, Frenk & White). This has
been found to provide a good fit to many numerically
simulated clusters, and is simple enough that analytic formulae
for many derived distributions have been worked out, as
described below. The three-dimensional radial dependence of
the density is given by

$$\rho(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^2}. \tag{32}$$

The gravitational lensing data are sensitive only to the
projected total mass density distribution, $\Sigma(x)$, where $x$ is the
scaled projected radius $x = s/r_s$. For a circularly symmetric
surface density, the magnitude of the shear is given by

$$|\gamma| = \frac{\bar{\Sigma}(x) - \Sigma(x)}{\Sigma_{\rm crit}} \tag{33}$$

(with the overbar denoting average density within $x$). Here,
the critical density $\Sigma_{\rm crit}$ is a factor dependent on the angular
diameter distances to and between the lens ($l$) and source ($s$)
planes:

$$\Sigma_{\rm crit} = \frac{c^2}{4\pi G}\,\frac{D_s}{D_l D_{ls}}. \tag{34}$$

The complex reduced shear at background galaxy position
radius $x$ and azimuthal angle $\phi$ can then be formed from

$$g = \frac{|\gamma|(x)}{1 - \kappa(x)}\left(-\cos(2\phi) - i\sin(2\phi)\right), \tag{35}$$

(with the convergence $\kappa = \Sigma/\Sigma_{\rm crit}$) and provide the
predicted lensing data. Analytical formulae for $|\gamma|(x)$ and $\kappa(x)$
are given by Wright & Brainerd (2000) and are not
reproduced here.

Note that for the NFW model the predicted lensing data
are most easily parameterised in terms of a scale radius and
density; however, the prior which we understand best is on
the total mass, defined to be that within a radius $r_{200}$ such
that the average density within $r_{200}$ is 200 times the critical
density of the Universe. The overdensity 200 is a value
often used by workers investigating numerical simulations of
clusters, a potential source of prior information. Given
values of $M_{200}$ and the corresponding concentration parameter
$c_{200} = r_{200}/r_s$, the scale radius and density can be
computed.
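
The conversion from $(M_{200}, c_{200})$ to the scale radius and density follows directly from the definition of $r_{200}$ and the enclosed mass of the profile in equation (32). The sketch below assumes masses in $h^{-1} M_\odot$, lengths in $h^{-1}$ Mpc, and the present-day critical density; the function names and the neglect of any redshift dependence of the critical density are assumptions made for illustration.

```python
import numpy as np

# Critical density at z = 0 in h^2 M_sun Mpc^-3 (assumed; no redshift evolution).
RHO_CRIT = 2.775e11

def nfw_scale_parameters(m200, c200, rho_crit=RHO_CRIT, overdensity=200.0):
    """Return (r_s, rho_s, r200) for an NFW halo of mass m200 and concentration c200."""
    # Mean density inside r200 is overdensity * rho_crit.
    r200 = (3.0 * m200 / (4.0 * np.pi * overdensity * rho_crit)) ** (1.0 / 3.0)
    r_s = r200 / c200
    m_c = np.log(1.0 + c200) - c200 / (1.0 + c200)
    rho_s = overdensity * rho_crit * c200 ** 3 / (3.0 * m_c)
    return r_s, rho_s, r200

def nfw_enclosed_mass(r, r_s, rho_s):
    """Mass inside radius r, from integrating equation (32) over volume."""
    x = r / r_s
    return 4.0 * np.pi * rho_s * r_s ** 3 * (np.log(1.0 + x) - x / (1.0 + x))
```

A useful consistency check is that `nfw_enclosed_mass(r200, r_s, rho_s)` returns the input `m200`, which confirms that the sampled and derived parameterisations describe the same halo.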



4.2 Gas distributions 

The SZ data are sensitive only to the cluster gas
pressure, which must be calculated from the model gas density
and temperature distributions. Both models assume a
one-parameter isothermal temperature profile. The difference
between the models comes in the gas density distribution. The
Beta model has gas density profile

$$\rho_{\rm gas}(r) = \frac{\rho_{\rm gas}(0)}{\left[1 + (r/r_c)^2\right]^{3\beta/2}}, \tag{36}$$

often used in cluster modelling (see e.g. Sarazin 1988). In
the iHSE model the gas is in full hydrostatic pressure
equilibrium with the potential defined by the NFW mass model
(iHSE). This potential is given by

$$\frac{d\Phi}{dr} = \frac{GM(r)}{r^2}, \tag{37}$$

$$M(r) = 4\pi\rho_s r_s^3\left[\log(1 + r/r_s) - \frac{r/r_s}{1 + r/r_s}\right]. \tag{38}$$

The equation of hydrostatic equilibrium is

$$\nabla P = -\rho_{\rm gas}\nabla\Phi, \tag{39}$$

which, with the assumption of a spherically symmetric
distribution of ideal gas at temperature $T$, becomes

$$\frac{d\log\rho_{\rm gas}}{d\log r} = -\frac{GM(r)\mu}{kTr}. \tag{40}$$

We assume a mass per particle of $\mu = 0.59$ times the proton
mass. Given the potential of equation (38), this equation can
be integrated (Suto et al. 1998) to give

$$\rho_{\rm gas}(r) = \rho_{\rm gas}(0)\exp\left\{-\frac{4\pi G\rho_s r_s^2\mu}{kT}\left[1 - \frac{\log(1 + r/r_s)}{r/r_s}\right]\right\}. \tag{41}$$

Note again that the predicted data are most easily
expressed in terms of a central gas density $\rho_{\rm gas}(0)$, which can
be computed by numerically integrating either gas density
profile to $r_{200}$ and normalising to the gas mass within this
radius, a parameter for which again the prior is better
understood. This integral is performed numerically, as is the
Abel integral used in projecting the gas pressure
distribution along the line of sight $l$, as required in the calculation
of the Compton $y$-parameter:

$$y(s) = \frac{\sigma_{\rm T}}{m_e c^2}\int n_e kT\,dl \tag{42}$$

$$\propto \int_s^{\infty}\frac{2r\rho_{\rm gas}(r)}{\sqrt{r^2 - s^2}}\,dr. \tag{43}$$

Predicted visibilities are generated by sampling the (Fast)
Fourier transform of the 30 GHz sky surface brightness $\delta I_\nu$,
related to the $y$-parameter by

$$\delta I_\nu = f(\nu)\,y\,B_\nu(T_{\rm CMB}), \tag{44}$$

where $B_\nu(T_{\rm CMB})$ is the CMB blackbody spectrum and $f(\nu)$
is a frequency-dependent factor approximately equal to $-2$
at 30 GHz.
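
For an isothermal gas the Compton $y$-parameter is proportional to the line-of-sight integral of the gas density, so the projection step can be sketched for the Beta model, where an analytic result is available as a check. This is an illustration under stated assumptions: the integral is taken along an infinite line of sight rather than truncated at a finite radius, the change of variable $l = {\rm scale}\times\tan u$ is one convenient choice among many, and all function names are hypothetical.

```python
import math
import numpy as np

def beta_profile(r, rho0, r_c, beta):
    """Beta-model gas density, equation (36)."""
    return rho0 * (1.0 + (r / r_c) ** 2) ** (-1.5 * beta)

def project_along_line_of_sight(profile, s, scale, n_steps=20000):
    """Integrate profile(sqrt(s^2 + l^2)) over -inf < l < inf.

    Uses l = scale * tan(u) with a midpoint rule in u on (0, pi/2),
    doubled by symmetry about l = 0.
    """
    du = 0.5 * np.pi / n_steps
    u = (np.arange(n_steps) + 0.5) * du
    l = scale * np.tan(u)
    integrand = profile(np.sqrt(s ** 2 + l ** 2)) * scale / np.cos(u) ** 2
    return 2.0 * np.sum(integrand) * du

def beta_projection_analytic(s, rho0, r_c, beta):
    """Closed-form projection of the Beta model, valid for beta > 1/3."""
    norm = math.sqrt(math.pi) * math.gamma(1.5 * beta - 0.5) / math.gamma(1.5 * beta)
    return rho0 * r_c * norm * (1.0 + (s / r_c) ** 2) ** (0.5 - 1.5 * beta)
```

The same numerical routine can be applied to the iHSE profile of equation (41), for which no simple closed form exists; multiplying the result by the temperature and the appropriate physical constants then gives $y(s)$.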

Figure 2 shows the Beta and iHSE model profiles, as
functions of three-dimensional radius for the gas density
profile (equation (41)), and of projected radius for the
Comptonisation parameter (equation (43)). Finally, the reader
should note that the Beta profile, free from the hydrostatic
equilibrium constraint, has two extra parameters, making
it more flexible in fitting the data. Moreover, the two gas
density profiles described here have been chosen to have
the same method of normalisation, allowing straightforward
comparison of the two models in fitting the data. This is
purely a matter of convenience, and is not required by the
methodology of Section 2.

Table 1. Cluster model parameters and priors. Inequalities denote uniform prior probability between the given limits, whilst (a ± b)
indicates a Gaussian prior centred on a with variance b^2.

Dataset                Model      Parameter  Truth                    Prior

Both                   Both       x0, y0     0.0, 0.0                 (0 ± 1) arcmin

Gravitational lensing  Both       M200       7.0 × 10^… h^-1 M_sun    … < M200 / h^-1 M_sun < 2 × 10^…
                                  c200       …                        4 < c200 < 12

SZ effect              Both       Mgas       4.9 × 10^… h^-1 M_sun    … < Mgas / h^-1 M_sun < 3 × 10^…
                                  T          8 keV                    (8 ± 2) keV
                       Beta only  rc         200 h^-1 kpc             … < rc / h^-1 kpc < 1000
                                  β          0.667                    0.3 < β < 1.5



5 APPLICATION TO SIMULATED DATA 

In this section we use simulated data to demonstrate the methods outlined in Section 2. We first focus on simulations of current data, namely observations of low redshift clusters in the optical with MegaCam at CFHT (Marshall et al. 2003, in preparation), and at 30 GHz with the extended VSA (Lancaster et al. 2003, in preparation). We then move on to consider data of the quality we might expect in the near future, at higher redshift, with SZ telescopes such as AMI. This is considered in combination with ground-based lensing data from a camera with a field of view again well matched to the SZ observations.

Our strategy is to simulate lensing and SZ data using each of the model clusters described in the previous section, and then analyse these data as recommended in Section 2, assuming in turn both of the models (one of which is the correct one). The "true" cluster parameters for each model are given in Table 1. We use the same model clusters for both current and next-generation mock observations, the only difference being a shift in redshift from 0.07 (current observations) to 0.2 (next-generation observations). Also given in Table 1 are the prior pdfs used in the mock analyses. The profiles plotted in Figure 2 correspond to these choices of true parameters.

The lensing data were generated by drawing $N$ background galaxy ellipticities from a Gaussian intrinsic ellipticity distribution of width 0.25, lensing them by the calculated reduced shear field of the cluster, then adding realistic Gaussian shape measurement noise with $\sigma_{\rm obs} \sim 0.2$. $N$ was specified by choosing a source density of 15 arcmin$^{-2}$, appropriate to a 3-hour ground-based observation (Clowe & Schneider 2Qq3).
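
The mock lensing catalogue described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used here: the reduced shear is taken to be a single constant complex number `g` instead of the position-dependent field of a model cluster, the quoted widths are interpreted as per-component Gaussian dispersions, and the lensing transformation is the standard one for complex ellipticities with $|g| < 1$, $\epsilon = (\epsilon_{\rm s} + g)/(1 + g^{*}\epsilon_{\rm s})$. All function and parameter names are hypothetical.

```python
import random

def simulate_ellipticities(n_gal, g, sigma_int=0.25, sigma_obs=0.2, seed=0):
    """Draw n_gal noisy, lensed complex ellipticities for a constant reduced shear g.

    sigma_int and sigma_obs are treated as per-component dispersions, which is
    an assumption about how the widths quoted in the text are defined.
    """
    rng = random.Random(seed)
    observed = []
    while len(observed) < n_gal:
        e_src = complex(rng.gauss(0.0, sigma_int), rng.gauss(0.0, sigma_int))
        if abs(e_src) >= 1.0:
            continue  # unphysical intrinsic ellipticity, redraw
        # Lens the intrinsic ellipticity by the reduced shear (|g| < 1 regime).
        e_lensed = (e_src + g) / (1.0 + g.conjugate() * e_src)
        # Add Gaussian shape measurement noise to each component.
        noise = complex(rng.gauss(0.0, sigma_obs), rng.gauss(0.0, sigma_obs))
        observed.append(e_lensed + noise)
    return observed

def number_of_galaxies(density_per_arcmin2, area_arcmin2):
    """N follows from the chosen source density and the (assumed) field area."""
    return int(round(density_per_arcmin2 * area_arcmin2))
```

Because the expectation value of the lensed ellipticity equals the reduced shear, averaging a large sample returned by `simulate_ellipticities` should recover `g` to within the noise.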

For the mock SZ datasets, the Fourier transform of the model cluster's surface brightness was calculated at each of the telescope-sampled points in the $uv$-plane; Gaussian noise was then added, drawn from the covariance matrix described in Section 3.2, with thermal receiver noise corresponding to 150 hours' observation.



5.1 Measuring cluster gas fractions 

With the model gas and total mass density profiles both normalised to the respective mass within $r_{200}$, it is straightforward to compute the cluster gas fraction within the same radius. Since the observable quantities, the Comptonisation parameter $y$ and the reduced shear $g$, cannot depend on the Hubble constant, the gas mass must have units of $h^{-2}M_\odot$ while the units of total mass are $h^{-1}M_\odot$. Consequently, the gas fraction measurable by combination of weak lensing and SZ effect data contains a factor of $h$:
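
The unit argument can be written out explicitly. The following is a sketch of the implied relation only; the dimensionless numbers $m_{\rm gas}$ and $m_{200}$ are introduced here for illustration and are not notation from the text.

```latex
f_{\rm gas} \equiv \frac{M_{\rm gas}}{M_{200}}
  = \frac{m_{\rm gas}\, h^{-2} M_\odot}{m_{200}\, h^{-1} M_\odot}
  = \frac{m_{\rm gas}}{m_{200}}\, h^{-1}
\quad\Longrightarrow\quad
f_{\rm gas}\, h = \frac{m_{\rm gas}}{m_{200}} .
```

The ratio of the two fitted mass parameters therefore constrains the combination $f_{\rm gas}h$ rather than $f_{\rm gas}$ alone.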
Figure 3. Implied prior probability distribution $\Pr(f_{\rm gas}h)$, as represented by an unsmoothed histogram of 50000 MCMC samples.



This is the same $h$-dependence as found by Grego et al. (2001) when constraining the potential with the assumption of hydrostatic equilibrium, but is different from that arising from the fitting of X-ray data. Combination of the results from lensing and SZ data with those from an analysis of X-ray data will thus break the degeneracy between the gas fraction and the Hubble parameter; this combination may be done equivalently by fitting all three data sets simultaneously, or by applying the joint posterior pdf from the X-ray analysis as a prior for the SZ/lensing analysis, or vice versa.

In this work, we leave the $h$-dependence of the gas fraction as it is, simply using the uniform priors of Table 1 on $M_{\rm gas}$ and $M_{200}$. However, these apparently innocuous priors lead to an informative prior on the combination $f_{\rm gas}h$. This can be seen by sampling the joint prior with no data using the MCMC algorithm (Slosar et al. 2003), and computing $M_{\rm gas}/M_{200}$ for each sample in the same way as would be done in the data analysis process. The resulting histogram is shown in Figure 3. The effect of the upper limit on the gas mass can be seen as the turnover point at $M_{\rm gas,max}/M_{200,{\rm max}} = 0.15$. Below this value, the prior is uniform, whereas above it the larger values of $f_{\rm gas}h$ are increasingly disfavoured, well reflecting our prior knowledge of cluster masses.
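
The shape of this implied prior can be reproduced with a few lines of code. The sketch below replaces the MCMC sampler with direct random draws from the two uniform priors, which is sufficient when no data are present. The upper limits are chosen to give the quoted ratio of 0.15, and the lower limits are assumed to be zero; both are assumptions of this illustration, as are the function names.

```python
import random

def implied_prior_samples(n, mgas_max=3e14, m200_max=2e15, seed=1):
    """Samples of Mgas/M200 with both masses drawn from uniform priors.

    Lower prior limits of zero are an assumption of this sketch.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        m_gas = rng.uniform(0.0, mgas_max)
        m_200 = rng.uniform(0.0, m200_max)
        if m_200 > 0.0:  # guard against division by zero
            samples.append(m_gas / m_200)
    return samples

def histogram(samples, lo, hi, n_bins):
    """Unsmoothed histogram counts of the samples falling in [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for z in samples:
        if lo <= z < hi:
            counts[min(int((z - lo) / width), n_bins - 1)] += 1
    return counts

ratios = implied_prior_samples(50000)
counts = histogram(ratios, 0.0, 0.45, 9)
```

Under these assumptions the density of the ratio is constant below 0.15 and falls as the inverse square of the ratio above it, so that half of the samples lie below the turnover.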



5.2 Current generation experiments 

Example model parameter inferences are shown in Figure 4, where in each panel the posterior density has been marginalised over all but two of the parameter dimensions. The effect of the Gaussian prior on the gas temperature can be seen in the right-hand panel. The estimated precisions on the three parameters $M_{200}$, $c$ and $M_{\rm gas}$ are 25, 25 and 50 percent respectively.

The left-hand panel of Figure 5 shows the cosmological results from the joint analysis of the current generation low redshift SZ and lensing data. This plot shows the answer to question (iii) as posed in Section 2. The joint evidences calculated for each model, for either given dataset, were equal within the numerical errors; this indicates that the model averaging equation has two terms to be considered, leading to the (quasi) model-independent statement about $f_{\rm gas}h$ given in the left panel of the figure.
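
The two-term model average can be sketched as below. This is a generic illustration of Bayesian model averaging, assuming equal prior probabilities for the models unless stated otherwise and per-model posterior pdfs tabulated on a common grid; the function names are hypothetical and the code is not taken from the analysis pipeline.

```python
import math

def model_posteriors(log_evidences, prior_probs=None):
    """Posterior model probabilities from log-evidences and prior probabilities."""
    n = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # assumed: equal prior belief in each model
    log_w = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    peak = max(log_w)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(lw - peak) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

def model_average(pdfs, weights):
    """Pointwise weighted sum of per-model posterior pdfs on a common grid."""
    n_points = len(pdfs[0])
    return [sum(w * pdf[i] for w, pdf in zip(weights, pdfs))
            for i in range(n_points)]
```

With equal evidences both models receive weight 0.5, so both terms contribute equally to the averaged distribution; an evidence ratio of 110 would instead give the favoured model a weight of 110/111.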

However, the right-hand panel shows the sensitivity of the VSA SZ data to the contaminant primordial CMB fluctuations: the model-averaged $f_{\rm gas}h$ probability distributions show significant variation with CMB realisation. Realisations 1, 2 and 3 correspond to the situations where the cluster lies approximately in front of a primordial CMB saddle point, a shallow trough and a peak respectively. In the last case the gas mass (and so the gas fraction) is then underestimated.

The evidence analysis of CMB realisations 2 and 3 shows neither model being preferred by more than a small factor in probability (< 3), regardless of the true cluster model. In some cases the primordial CMB is being fitted (by the more flexible Beta model) as well as the cluster, and in others the noise is high enough for the Occam factor in the evidence to dominate and the simpler iHSE model is preferred. However, the extent to which the evidence favours either model is never greater than the belief threshold suggested in Section 2. This indicates that the presence of the primordial CMB fluctuations has been dealt with correctly: the inconclusiveness of the evidence ratios ensures that the conclusions drawn about the structure of the cluster are not systematically incorrect.

However, the implication of this result is that in order to investigate the astrophysics of low redshift clusters via SZ and gravitational lensing (questions (i) and (ii)), more information is required. This could take the form of multi-frequency SZ observations, to allow better separation of the cluster and primordial CMB components (Lancaster et al. 2003, in preparation), or stronger priors on the cluster parameters. Reducing the freedom of the Beta model to fit the CMB fluctuations will indeed produce more precise gas fraction estimates, but to be confident of the accuracy of these numbers the applied priors should be strongly physically motivated. A good first step in this direction would be to derive joint priors on any model's parameters from a large sample of hydrodynamically-simulated clusters.

5.3 Next generation experiments 

We now move on to consider the kind of observations we can expect from upcoming SZ telescopes such as AMI (Kneissl et al. 2001), the SZA (Mohr et al. 2002) and AMiBA (Lo et al. 2000), in combination with matched lensing observations from wide field optical cameras. Rather than compare the different experiments, we note that they are qualitatively similar instruments and proceed to use AMI, and for the mock lensing observations the ESO Wide Field Imager, as specific examples in this work. AMI will consist of (a) ten close-packed 3.7-m dishes operating at 15 GHz and (b) the eight 13-m dishes of the current Ryle Telescope (which are separated by longer baselines). The two parts of this array will have different correlators and therefore provide two independent measurements of the sky, allowing the two datasets to be combined by a simple sum of the individual log-likelihoods.
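
As an illustration of why independence reduces the combination to a sum, consider the simplified case of uncorrelated Gaussian noise with a common rms within each dataset. This is an assumption of the sketch: the visibility likelihood used in this work has a full covariance matrix. The names below are hypothetical.

```python
import math

def gaussian_loglike(data, model, sigma):
    """Log-likelihood of independent Gaussian data points with common rms sigma."""
    chi2 = sum(((d - m) / sigma) ** 2 for d, m in zip(data, model))
    return -0.5 * chi2 - len(data) * math.log(sigma * math.sqrt(2.0 * math.pi))

def joint_loglike(datasets):
    """Sum of log-likelihoods over independent datasets.

    Each entry of datasets is a (data, model, sigma) tuple.
    """
    return sum(gaussian_loglike(d, m, s) for d, m, s in datasets)

# Two toy "arrays" observing the same model, each with its own noise level.
small_dishes = ([1.0, 2.0], [1.1, 1.9], 0.5)
large_dishes = ([3.0], [2.5], 0.2)
total = joint_loglike([small_dishes, large_dishes])
```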

An important question is that of the strength of the primordial CMB on the angular scales to which AMI is sensitive. These correspond to a maximum $\ell$-range of 1000 to 6600 for AMI's small dishes, and 6000 to 36000 for the large Ryle Telescope dishes; note that in practice the minimum $\ell$ values used in observations will be significantly greater. The primordial CMB power spectrum is relatively poorly known in this region, with only the measurements by ACBAR (Runyan et al. 2003) extending to $\ell$ of 2500 and CBI (Pearson et al. 2003; Mason et al. 2003) extending to $\ell$ of 3500. The CBI group find an excess of power at the higher of these wavenumbers, which has been interpreted as being due to the integrated SZ effect of the large-scale structure along the line of sight (Bond et al. 2003). For the purposes of this work we make two assumptions: first, that the primordial CMB can be neglected for the Ryle observations at $\ell > 6000$, and second, that extrapolating the power spectrum of Figure 1 to this $\ell$ limit gives a reasonable estimate of the amplitude of the primordial fluctuations on larger scales. For comparison with the latter we also simulated short baseline data with no contribution to the noise from the CMB.

Figure 4. Marginalised probability distributions of cluster astrophysical parameters from mock current experiments. Left: $\Pr(M_{200}, c \,|\, {\rm Data}, {\rm iHSE})$; Right: $\Pr(T, M_{\rm gas} \,|\, {\rm Data}, {\rm iHSE})$. MCMC samples are shown as points, overlaid on their minimally-smoothed histogram (marked with contours enclosing 68 and 90% of the posterior probability). The white star shows the true parameter values.

Figure 5. Marginalised probability distributions of cluster cosmological parameters from mock current experiments. Left: Marginalised probability distributions of $f_{\rm gas}h$; the true cluster model is iHSE. $\Pr(f_{\rm gas}h \,|\, {\rm Data}, {\rm Beta})$ and $\Pr(f_{\rm gas}h \,|\, {\rm Data}, {\rm iHSE})$ are plotted with full and dashed lines respectively; the dotted curve shows the result of model-averaging, $\Pr(f_{\rm gas}h \,|\, {\rm Data})$. Right: the effect of different CMB realisations on the model-averaged inferences. In both plots the true value of $f_{\rm gas}h$ is shown by the dark solid vertical line.

Example model parameter inferences from data of this quality are shown in Figure 6, where in each panel the posterior density has been marginalised over all but two of the parameter dimensions. The effect of the Gaussian prior on the gas temperature can be seen in the right-hand panel. The estimated model-dependent precisions on each of the three parameters $M_{200}$, $c$ and $M_{\rm gas}$ are 20, 20 and 12 percent respectively. The improvement in the results from the lensing data is slight, the increased lensing strength being balanced by the decreased number of observed background galaxies due to the smaller field of view. The improvement in gas mass estimation is more marked, due to a combination of reduced primordial CMB at the higher $\ell$-values, and the more comprehensive $uv$-plane coverage afforded by AMI.

Figure 6. Marginalised probability distributions of cluster astrophysical parameters from mock next-generation experiments. Left: $\Pr(M_{200}, c \,|\, {\rm Data}, {\rm iHSE})$; Right: $\Pr(T, M_{\rm gas} \,|\, {\rm Data}, {\rm iHSE})$. MCMC samples are shown as points, overlaid on their minimally-smoothed histogram (marked with contours enclosing 68 and 90% of the posterior probability). The white star shows the true parameter values.

Figure 7. Marginalised probability distributions of cluster cosmological parameters from mock next-generation experiments. Left: Marginalised probability distributions of $f_{\rm gas}h$; the true cluster model is iHSE. $\Pr(f_{\rm gas}h \,|\, {\rm Data}, {\rm Beta})$ and $\Pr(f_{\rm gas}h \,|\, {\rm Data}, {\rm iHSE})$ are plotted with solid and dashed lines respectively; the dotted curve shows the result of model-averaging, $\Pr(f_{\rm gas}h \,|\, {\rm Data})$, and is indistinguishable from the (correctly) selected model curve. Right: the effect of different CMB realisations on the model-averaged inferences. In both plots the true value of $f_{\rm gas}h$ is shown by the dark solid vertical line.



Table 2 gives the evidence ratios for the experiments outlined above. In the case where no contaminant primordial CMB is present the most probable model matches that used in simulating the data. The factor by which the true model is more likely is larger when the iHSE model is used: this model is both simpler, and it provides a better fit. The same is true when the primordial fluctuations are present, with suitably (but not greatly) reduced evidence ratios, with the variation in the evidence ratios being due to differing noise realisations. We again see the sensitivity to the noise realisation, with confident model selection possible in only 4 out of the 6 analyses. For the Beta model cluster, the same balance between model complexity and goodness of fit is seen as in Section 5.2, suggesting the need for informative priors even for this higher quality data.

Table 2. Evidence ratios from the joint analysis of the "next-generation" mock observations; this quantity is given by Pr(Data | correct model)/Pr(Data | incorrect model) and so should be greater than unity if accurate astrophysical conclusions are to be drawn.

CMB included?   Model   Evidence ratio (CMB realisations 1, 2, 3)
Yes             Beta    1.5, 81, 2.5
Yes             iHSE    110, 110, 37
No              Beta    20
No              iHSE    600

Figure 7 shows the cosmological inference drawn from the analysis of the mock iHSE cluster. The model-averaged $f_{\rm gas}h$ probability distributions are dominated by the iHSE model contribution - this astrophysical model has been successfully selected by the evidence. Taking the median sample as an estimator for the gas fraction we find $f_{\rm gas}h \approx 0.061$, with errors of roughly 0.015. With no CMB contamination this estimate changes to $f_{\rm gas}h \approx 0.054$, with errors of roughly 0.013, an increase in precision of just 5% (from 28 to 23%). This is an indication of the small contribution to the error budget that the primordial CMB has at these angular scales. Indeed, as previously mentioned, the shortest AMI baseline will be longer than that used in this work, such that the effect of the primordial CMB will be reduced; we might therefore expect results lying in between the two situations simulated here.



6 DISCUSSION

The analysis of the simulated data presented in the previous section was designed to be a simple demonstration of a general methodology. In the current section we discuss the advantages and disadvantages of our approach, and its ability to be extended, beginning with some comparison with other methods currently in use.

The Bayesian method described here can be seen as a generalisation of the model fitting procedures employed by other workers. For example, King et al. (2002) fitted lens models with 1, 2 and 3 parameters to weak shear data for Abell 1689 by the maximum likelihood method; they computed likelihood contours for the parameter uncertainties and compared the models' goodness of fit with the likelihood ratio test. As explained in Section 2, such a grid-based computation is not practical with the 6-8 parameters used here. Indeed, numerical maximisation of a function of 6-8 variables is already a demanding problem, especially when there are multiple maxima to be investigated. Comparing models by their maximum likelihoods is also rather sensitive to noise features in the data, rather than assessing the relative appropriateness of the models to the task of explaining the data. The maximum likelihood ratio is formally equivalent to the evidence ratio when the prior pdf is a delta-function at the best-fit point: this is clearly not an accurate representation of our prior knowledge. The fitting of SZ visibility data by Grego et al. (2001), and the joint maximum likelihood analysis of SZ and X-ray data of Reese et al. (200^), are two other examples of a small number of parameters being fitted within the context of a single cluster model; one of the aims of this work was to provide a complete framework which combined and extended analyses such as these.

Other methods proposed for use in the joint analysis of cluster data have, to date, been focused on "parameter-free" reconstruction. Reblinsky (2000) suggests an iterative procedure for refining the cluster potential using X-ray, SZ and weak lensing images, whilst Zaroubi et al. (2001) provide a direct inversion method to take these images and produce a three-dimensional cluster model, making use of some attractive features of working in the Fourier domain. Doré et al. (2001) suggest using SZ and weak lensing data to constrain successive orders of perturbation from spherical symmetry, again working from ready-made maps. All these methods have been shown to work well with noiseless input. However, those working with the observations have opted to increase the complexity of their cluster models in a more gradual way, for example fitting an X-ray map with a simple model and using this model to predict the observed SZ effect (e.g. Jones et al. 2001). The method described here is this common sense reduced to calculation: the use of simple functions for the potential, gas density and temperature allows the model complexity to be tuned to the data quality via the evidence. The "parameter-free" methods referred to above actually have many parameters, usually the values of pixels in a grid: direct methods will produce one set of parameters, but these may not be the most probable given the data, or even the most appropriate given the parameter degeneracies. When trying to measure quantities such as the gas fraction, all the cluster configurations allowed by the data should be accounted for, in order to calculate an accurate confidence interval; only by fully exploring a model's parameter space can this be achieved. Indeed, consideration of a range of models is then desirable to gain the next level of accuracy, one that is model-independent in the sense of the discussion in Section 2. Sampling from a model's parameter space produces the set of cluster configurations permitted by the data, which is arguably more useful than the unique solutions generated by direct methods.

One might wonder how the MCMC technique endorsed in this work would perform if used in the many-parameter modelling of the type mentioned above. Indeed, in this context the number of parameters included here is rather small. The evidence itself is a guide in the development of these methods, providing a handle on the information content of the data: if the evidence does not favour a triaxial ellipsoid over a spherical model, then it might reasonably be assumed not to favour a "parameter-free" representation. The ability of a sampler to cope with increasing numbers of parameters is somewhat sensitive to the shape of the posterior distribution under investigation; we find that including extra nuisance parameters (the parameters of point sources contaminating the SZ data, for example) does not affect the accuracy of the posterior exploration (Lancaster et al. 2003, in preparation), and neither does increasing the number of sub-clumps when modelling gravitational lenses with multiple mass concentrations (Kneib et al. 2003). Moving to three-dimensional cluster modelling introduces a number of strong degeneracies in the parameter space (Fox & Pen 2002) which may well require a tailor-made MCMC sampler instead of the general purpose engine used here. Such a sampler might be expected to reduce the run-time of the method, which (compared to a direct inversion or a downhill simplex minimisation) is its major disadvantage: to produce each posterior distribution shown in the figures of this paper a computation time of several hours with a 1 GHz processor was required. The evidence calculation takes rather longer, since the numerical precision comes from repeated posterior explorations. The computation time scales approximately as $N_{\rm data} \times N_{\rm parameters}$ (Gilks et al. 1996), prompting careful design of the sampler and likelihood calculation. However, the alternatives for coping with noisy data can be just as time consuming; for instance, resampling of the data to generate confidence limits (e.g. Allen et al. 2003) effectively performs the same calculations as the MCMC process, but with more limited output. Nevertheless, it is the computational cost of the method that is perhaps the most urgent aspect to be addressed in further work.



7 CONCLUSIONS 

We have developed an algorithm based on the Markov-Chain 
Monte-Carlo technique for investigating simple but many- 
parameter models of clusters. The method allows straight- 
forward inclusion of many datasets, correctly weighting each 
datapoint according to its assumed likelihood; by exploring 
the posterior probability distribution rather than the like- 
lihood we can incorporate information on the cluster from 
other sources via the parameter prior densities. Calculation 
of the Bayesian evidence by thermodynamic integration dur- 
ing the burn-in period can be done to sufficient accuracy to 
allow different astrophysical models to be compared; this 
statistic automatically includes the common-sense of Oc- 
cam's razor, allowing movement away from the simplest as- 
sumed models only when the data require it. 

Applying the method to simulated weak gravitational 
lensing and interferometric Sunyaev-Zel'dovich effect data 
we draw the following conclusions: 

• Gravitational weak lensing data allow cluster mass dis- 
tribution model parameters to be estimated with a precision 
of around 20 percent over the small range of low redshifts 
discussed here. 

• Primordial CMB anisotropies contaminate SZ observations on angular scales less than $\ell \sim 1000$. However, the nature of the primordial fluctuations allows them to be properly accounted for in the likelihood function, preventing inaccurate conclusions about either the cluster model or the gas fraction. The available precision on the cluster gas mass increases by a factor of approximately two (to around 10 percent) when moving from instruments such as the VSA to those more like AMI.

• The behaviour of the Bayesian evidence as a model se- 
lection tool can be understood in terms of both goodness 
of fit and model complexity; in the case of the primordial 
CMB contaminated data, the more flexible fitting formu- 
lae are sometimes favoured by the evidence as they pro- 
vide a better fit to all the data, prompting the need for 
improved prior constraints on the model parameters. One 



recommended source of these priors is a sample of numeri- 
cally simulated clusters. 

• Where the SZ data are less contaminated by the pri- 
mordial CMB the evidence does indeed allow successful as- 
trophysical model selection, leading to accurate conclusions 
about the dynamical state of the cluster under observation. 

• Where the evidences for a range of models are of comparable size, a correctly weighted average may be taken, resulting in appropriately precise inferred uncertainties on the parameter in question - for the gas fraction $f_{\rm gas}h$ we may expect a model-independent uncertainty on this parameter of around 50 percent for a low redshift cluster, and under 30 percent with observations of a cluster with AMI.

This last point is one worth returning to; under the assumption of a universal cluster gas fraction the model-independent inferences for the independent members of a sample of $N$ clusters can be combined by straightforward multiplication of their posterior probability distributions, reducing the uncertainty on this parameter by a factor of approximately $\sqrt{N}$.
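
The $\sqrt{N}$ scaling can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each cluster yields the same Gaussian posterior on the gas fraction parameter, tabulated as a log-pdf on a common grid; real posteriors will differ in centre, width and shape, and the function name is hypothetical.

```python
import math

def combine_posteriors(log_posteriors, grid):
    """Multiply posteriors (add log-pdfs) and return the mean and rms width."""
    log_prod = [sum(lp[i] for lp in log_posteriors) for i in range(len(grid))]
    peak = max(log_prod)  # rescale before exponentiating to avoid underflow
    weights = [math.exp(v - peak) for v in log_prod]
    norm = sum(weights)
    mean = sum(w * x for w, x in zip(weights, grid)) / norm
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)) / norm
    return mean, math.sqrt(var)

# Assumed toy posterior: Gaussian centred on 0.07 with width 0.02.
grid = [i * 1e-4 for i in range(2001)]
single = [-0.5 * ((x - 0.07) / 0.02) ** 2 for x in grid]
mean_9, sigma_9 = combine_posteriors([single] * 9, grid)
```

Combining nine such clusters narrows the width from 0.02 to 0.02/3, as expected for $N = 9$.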

The work described in this paper is straightforwardly 
extendable to incorporate X-ray, and indeed any other, clus- 
ter data. Similarly, investigating more complex cluster mod- 
els by relaxing the assumptions of isothermality, spherical 
symmetry and a single cluster potential is easy to do, with 
the evidence providing the self-consistent and logical way 
through the astrophysical model analysis. The obvious im- 
mediate next step to take is the inclusion of the X-ray data, 
and the opening up of another dimension of cosmological 
parameter space, that of the Hubble parameter: this anal- 
ysis applied to clusters observed with the VSA will be the 
subject of future publications.